ABSTRACT

A historical perspective on and substantive review of
the concept of invariance are provided. Progress made toward solving
measurement problems related to invariance is also assessed. Two
major classes of invariant measurement are described: (1)
sample-invariant item calibration; and (2) item-invariant measurement
of individuals. The work of S. S. Stevens is used to help clarify the
concept of invariance. The importance of invariance as a key
measurement concept is then illustrated via the measurement theories
of E. L. Thorndike, L. L. Thurstone, and G. Rasch. The study
methodology uses quotations and original figures to illustrate how
these researchers addressed measurement problems related to
invariance. A comparison and discussion of these three researchers'
theories of measurement are presented in terms of their contributions
to the solution of problems related to the concept of invariance.
Rasch's research is seen as the means by which the issues raised by
the other two researchers were resolved. A case is made for viewing
invariance as a fundamental aspect of measurement in the behavioral
sciences. Invariance appears to be essential in order to realize the
advantages of objective measurement. A 58-item list of references,
one table, and five figures are included. (TJH)
Abstract

The purpose of this study is to provide a historical
perspective on the concept of invariance. Two major classes of
invariant measurement are described: sample-invariant item
calibration and item-invariant measurement of individuals. The
work of Stevens is used to help clarify the concept of invariance.
The importance of invariance as a key measurement concept is then
illustrated with the measurement theories of Thorndike, Thurstone
and Rasch. A case is made for viewing invariance as a fundamental
aspect of measurement in the behavioral sciences; invariance
appears to be essential in order to realize the advantages of
objective measurement.
HISTORICAL VIEWS OF INVARIANCE: EVIDENCE FROM
THE MEASUREMENT THEORIES OF THORNDIKE, THURSTONE AND RASCH

The history of science is the history
of measurement (Cattell, 1893, p. 316)

The scientist is usually looking for
invariance whether he knows it or not
(Stevens, 1951, p. 20)

Invariance has been identified as a fundamental aspect of
measurement in the behavioral sciences (Andrich, 1988a; Bock &
Jones, 1968; Jones, 1960; Stevens, 1951). In essence, the goal of
invariant measurement has been succinctly stated by Stevens: "the
scientist seeks measures that will stay put while his back is
turned" (1951, p. 21). The concept of invariance has implications
for both item calibration and the measurement of individuals.

Many of the measurement problems that confront us in 
psychology and education today, such as those related to 
invariance, are not new. By taking a historical perspective on 
these measurement problems, we can increase our understanding of
the measurement problems themselves, assess the adequacy of
solutions proposed by major measurement theorists and identify 
promising areas for future research. Progress, and in some cases 
lack of progress, towards the solution of basic measurement 
problems can also be meaningfully documented. 
During the 20th century, there have been two major research
traditions which have guided measurement theorists attempting to
quantify various human characteristics, such as abilities,
aptitudes and attitudes. One tradition has its roots in the
psychometric work of Charles Spearman (1904); this research
tradition is focused on the test score and is primarily concerned
with measurement error and the decomposition of an observed test
score into several components including a "true" score and various
error components. This research tradition within mental test
theory can be labelled "classical test theory". A second research
tradition which has developed in a parallel fashion has its roots
in the 19th century work in psychophysics and has continued into
present practice through the various forms of latent trait theory
or more specifically item response theory (IRT). This second
research tradition will be referred to as "scaling theory". The
focus of research within this second tradition is on the
calibration of both individuals and items onto a latent variable
scale. Within these two research traditions, classical test theory
and scaling theory, there are several dominant perspectives that
have evolved over time. For example, Spearman's research on
classical test theory has been extended through generalizability
theory (Brennan, 1983; Cronbach, Gleser, Nanda & Rajaratnam, 1972;
Shavelson, Webb, & Rowley, 1989), as well as the LISREL models
developed by Karl Joreskog (Joreskog & Sorbom, 1986). This paper
examines progress within the second measurement tradition of
scaling theory due to the contributions of Thorndike, Thurstone and
Rasch; measurement perspectives within classical test theory will
not be addressed in detail here.

A great deal of educational and psychological research has
been conducted within the framework of classical test theory and
empirical research workers routinely include "coefficient alphas"
or "KR-20s" for the instruments used in their studies. Along with
this concern for "reliability" coefficients, research workers have
also worried about the validity of their instruments, although
documenting what a test score really represents is rarely resolved
in most studies and may ultimately be the most important research
question of all. Instead of focusing on measurement problems
related to reliability and validity, which are the central concepts
of classical test theory (Loevinger, 1957), this study focuses on
measurement problems related to the concept of invariance which
appear clearly within scaling theory; this is not to say that the
concepts of reliability or especially validity are unimportant,
rather that different research traditions focus on different
aspects of the measurement problems encountered in the behavioral
sciences. In fact, invariance has important relationships to and
implications for issues related to reliability and validity, and is
essential for gaining a clear understanding of certain persistent
problems encountered in classical test theory. As pointed out by
Jones and Appelbaum (1989), developments in item response theory
have led to constructive changes in psychological testing and the
"primary advantage of IRT over classical test theory resides in
properties of invariance" (p. 24).
Purpose 

The purpose of this paper is to provide a historical 
perspective on the concept of invariance. Several enduring
measurement problems related to item calibration and the
measurement of individuals can be meaningfully viewed using the
concept of invariance. The measurement theories of Thorndike,
Thurstone and Rasch are used because they address measurement
problems related to the concept of invariance and proposed 
solutions to these problems. These measurement theorists also 
share a common research tradition based on scaling theory. 
Method 

Quotations and original figures when available are used to
illustrate how Thorndike, Thurstone and Rasch addressed measurement
problems related to invariance. Although there are quantitative
aspects to the approaches used to address invariance, it is beyond
the scope of this paper to provide detailed derivations of the
equations used by each theorist to achieve sample-invariant item
calibration and item-invariant measurement of individuals. These
derivations are presented in Engelhard (1984) for measurement
problems related to sample-invariant item calibration; a parallel
analysis can also be developed for issues related to the
item-invariant measurement of individuals, but is not included here.

In the next section of this paper, the concept of invariance
is defined and arguments are presented for its importance as a key
idea in measurement. A description of the measurement theories of
Thorndike, Thurstone and Rasch is presented next; the role of
invariance in each of these theories is also examined. Next, a
comparison and discussion of these three theories of measurement is
presented in terms of their contributions to the solution of
problems related to the concept of invariance. The final section
includes a summary of the major points of this paper, as well as
suggestions for additional research in this area.

THE CONCEPT OF INVARIANCE

Within the behavioral sciences, S. S. Stevens (1951) has
presented one of the strongest cases for the general importance of
the concept of invariance. In his chapter on "Mathematics,
Measurement and Psychophysics", which appeared in the Handbook of
Experimental Psychology, Stevens described the role of this concept
in mathematics and physics, and he argued that "many psychological
problems are already conceived as the deliberate search for
invariances" (p. 20). In fact, Stevens defined the whole field of
science in terms of a quest for invariance and the concomitant
generalizability of results. In his words,

The scientist is usually looking for invariance whether
he knows it or not. Whenever he discovers a functional
relationship his next question follows naturally: under
what conditions does it hold? . . . The quest for invariant
relations is essentially the aspiration toward generality,
and in psychology, as in physics, the principles that have
wide applications are those we prize.

(Stevens, 1951, p. 20)
Applying this view of invariance more specifically to 
measurement issues, Stevens used the concept of invariance to 
define his familiar scales of measurement: nominal, ordinal,
interval and ratio scales (Stevens, 1946); in his words,

Each of the four classes of scales is best characterized by 
its range of invariance — by the kinds of transformations 
that leave the "structure" of the scale undistorted. And 
the nature of invariance sets limits to the kinds of 
statistical manipulations that can be legitimately applied to 
the scaled data. (Stevens, 1951, p. 23) 
Influenced by the insightful work of Mosier (1940, 1941), Stevens
pointed out the symmetry between the fields of psychophysics and
psychometrics as related to the concept of invariance:

Psychophysics sees the response as an indicator of an 
attribute of the individual — an attribute that varies 
with the stimulus and is relatively invariant from person 
to person. Psychometrics regards the response as 
indicative of an attribute that varies from person to person 
but is relatively invariant for different stimuli. Both 
psychophysics and psychometrics make it their business to 
display the conditions and limits of these invariances. 

(Stevens, 1951, p. 31)
The first sentence in this quotation illustrates the idea of
sample-invariant item calibration, while the second sentence points
to the idea of item-invariant measurement of individuals. This
duality between psychophysics and psychometrics, which was clearly
described by Mosier (1940, 1941) and pointed out even earlier by
Guilford (1936), represents one of the five major ideas underlying
test theory identified by Lumsden (1976). Measurement problems
related to invariance can be meaningfully viewed in terms of these
two broad classes: sample-invariant item calibration and
item-invariant measurement of individuals.

Within each of these two classes, invariance over methods and
conditions can be examined. Methods refer to the statistical
procedures and models, including the method used to collect the
data, used within the measurement theory. For example, paired
comparison and successive interval scaling would represent
different methods of data collection, as well as require different
statistical models. Conditions can refer to subgroupings
of items and/or examinees. For example, test equating is concerned
with the development of procedures which yield comparable estimates
of an individual's ability which are invariant over the subgroups
of items (tests) which are used to obtain these ability estimates.
As another example, the research on item bias, or differential item
functioning as it has come to be labelled, reflects a concern with
whether or not the meaning of an individual's responses on a
particular test item varies as a function of irrelevant factors
related to membership in various social categories, such as
gender, race and social class groups.
Sample-invariant item calibration

The basic measurement problem underlying sample-invariant item
calibration is how to minimize the influence of arbitrary samples
of individuals on the estimation of item scale values. For
example, Engelhard (1984) found that Thorndike provided a single
adjustment (location) for differences in group characteristics,
while Thurstone provided for two adjustments (location and scale).
Rasch's approach to sample-invariant calibration can be viewed as
providing three adjustments (location, scale and an individual
level response [illegible]) [illegible] sample-invariant item
calibrations.

The overall goal of sample-invariant calibration of items is
to estimate the location of items on a latent variable of interest
that will remain [illegible] and
also across various subgroups [illegible]. If the goal
of sample-invariant calibration is achieved, then the item scale
values will not be a function of sample characteristics, such as
ability level, gender, race and social class. Further, the
calibration of the items should also be invariant over subsets of
items, so that if we are developing a calibrated item bank, the
scale values of the items are not affected by the inclusion or
exclusion of other items in the bank.
Item-invariant measurement of individuals

Turning now to item-invariant measurement, the basic
measurement problem involves minimizing the influence of the
particular items which happen to be used to estimate an
individual's ability. [illegible]
and equating of test scores, as well as the scoring of each
individual's performance. Solutions to this problem usually
involve adjustments for item characteristics (item difficulty) and
test characteristics (location, dispersion and shape of item
distributions on the latent variable scale). The overall objective
is to obtain comparable estimates of individual ability regardless
of which items are included in the test. This is essentially the
problem of equating person measurements obtained on tests composed
of different items (Engelhard & Osberg, 1983). Invariance over
scoring method also requires attention. In addition to considering
invariance over methods, it is important to consider invariance
over conditions within this context; an individual's score should
not depend on the scores of other individuals being tested at the
same time.

In summary, invariance can be viewed as an important general
concept in the physical and behavioral sciences, as well as a key
aspect of successful measurement in the behavioral sciences. As
pointed out by Bock and Jones (1968), "in a well-developed science,
measurement can be made to yield invariant results over a variety
of measurement methods and over a range of experimental conditions
for any one method" (p. 9). In outline form, this can be
summarized as follows:
Classes of Invariant Measurement

I. Sample-invariant item calibration

A. Invariance over methods (statistical procedures/models)

B. Invariance over conditions (groups of individuals/items)

II. Item-invariant measurement of individuals

A. Invariance over methods (statistical procedures/models)

B. Invariance over conditions (groups of individuals/items)

THREE MEASUREMENT THEORIES AND INVARIANT MEASUREMENT

The purpose of this section is to describe and illustrate how
the concept of invariance emerged within the measurement theories
of Thorndike, Thurstone, and Rasch. Since the clearest statement
of the conditions necessary to accomplish invariance is presented
in the measurement theory of Rasch, I will begin with his research
and then trace the adumbrations of these ideas within the work of
Thurstone and Thorndike. I should also point out that all three of
these theorists wrote extensively on various measurement problems,
and for Thorndike especially it was sometimes difficult to point to
one consistent set of principles that defined his definitive
"theory of measurement". In order to address this issue, I have
explicitly cited certain texts and it should be understood that I
am using these to define a particular individual's "measurement
theory". This was not much of a problem for Rasch because he was
very consistent in his views related to invariance; Thurstone was
fairly consistent, while Thorndike was the least consistent of the
three.
Rasch

Based on psychometric research conducted during the 1950s,
Rasch (1980/1960, 1961, 1966a, 1966b) presented a set of ideas and
methods which were described by Loevinger (1965) as a "truly new
approach to psychometric problems" (p. 151) which can lead to
"nonarbitrary measures" (p. 151). One of the major characteristics
of this "new approach" was Rasch's explicit concern with the
development of "individual-centered techniques" as opposed to the
group-based measurement models used by measurement theorists such
as Thorndike and Thurstone. In Rasch's words, "individual-centered
statistical techniques require models in which each
individual is characterized separately and from which, given
adequate data, the individual parameters can be estimated"
(1980/1960, p. XX).

Problems related to invariance play an important role in
motivating the measurement theory of Rasch. As pointed out by
Andrich (1988a), Rasch presented "two principles of invariance for
making comparisons that in an important sense precede, though
inevitably lead to, measurement" (p. 18). Rasch's concept of
"specific objectivity", which he formulated in terms of his
principles of comparison, forms his version of the goals of
invariant measurement (Rasch, 1977). In Rasch's words,

The comparison between two stimuli should be independent of
which particular individuals were instrumental for the
comparison; and it should also be independent of which
stimuli within the considered class were or might also have
been compared. Symmetrically, a comparison between two
individuals should be independent of which particular stimuli
within the class considered were instrumental for the
comparison; and it should also be independent of which other
individuals were also compared, on the same or on some other
occasion (Rasch, 1961, pp. 331-332).
It is clear in this quotation that Rasch recognized the importance
of both sample-invariant item calibration and item-invariant
measurement of individuals. In fact, he made them the cornerstones
of his quest for "specific objectivity". In order to address
problems related to invariance, Rasch laid the foundation for the
development of a "family of measurement models" which are
characterized by separability of item and person parameters
(Masters & Wright, 1984).
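
The separability property can be made concrete with a small numerical sketch. It assumes the dichotomous logistic form of the Rasch model, in which the probability of success depends only on the difference between a person's ability and an item's difficulty; that functional form and the function names below are illustrative assumptions, not notation taken from the works cited here. Under this assumption, the comparison of two items, conditional on exactly one of them being answered correctly, does not depend on the ability of the individual who happened to be used for the comparison.

```python
import math


def rasch_prob(ability, difficulty):
    """P(correct) under the assumed dichotomous Rasch model (logit scale)."""
    return 1.0 / (1.0 + math.exp(-(ability - difficulty)))


def prob_first_given_one_correct(ability, d_i, d_j):
    """P(item i right and item j wrong | exactly one of the two is right)."""
    p_i = rasch_prob(ability, d_i)
    p_j = rasch_prob(ability, d_j)
    only_i = p_i * (1.0 - p_j)
    only_j = (1.0 - p_i) * p_j
    return only_i / (only_i + only_j)


def closed_form(d_i, d_j):
    """The same conditional probability written without any ability term."""
    return 1.0 / (1.0 + math.exp(d_i - d_j))


# Hypothetical difficulties and abilities, chosen only for illustration.
d_easy, d_hard = -0.5, 1.0
comparisons = [
    prob_first_given_one_correct(ability, d_easy, d_hard)
    for ability in (-2.0, 0.0, 3.0)
]
# Every entry of `comparisons` agrees with closed_form(d_easy, d_hard):
# the person parameter cancels out of the item comparison.
```

By the formal symmetry between persons and items, the same cancellation holds for the comparison of two individuals on a single item, which is the item-invariant side of Rasch's requirement.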

Rasch's approach to sample-invariant item calibration involved
the comparison of item difficulties obtained in separate groups.
In his words,

In relation to attainment tests all the school grades for
which the tests are in practice applicable may be considered
as forming a total collection of persons, that may be divided
into subpopulations, such as single grades, sex groups and age
groups within a grade, social strata, etc. Between the test
results in such more or less extensive groups the same
fundamental relationship must hold, and if so we shall use the
term that the relationship is "relatively independent of
population", the qualification "relatively" pointing to the
degree of breakdown that has been applied to the data.

(Rasch, 1980/1960, p. 9)
In his book, he used ability groups formed on the basis of raw
scores. In essence, Rasch was "looking for trouble in a more or
less definite direction, namely, for the possibility that the
relative difficulties of the tests may vary with [raw score] that
is, with the reading inability of the children" (Rasch, 1961, p.
323). This "test of fit" or what Rasch referred to as "control of
the model" was presented graphically. In order to illustrate this
idea, the results for two subtests, N and F, from the Danish
Military Group Intelligence Test (BPP), which were used by Rasch
(1980/1960), are presented in Figure 1. The test data were obtained
from 1,904 recruits who were tested in September 1953.

Insert Figure 1 about here

The results for Subtest N are presented in Panel A (Rasch,
1980/1960, p. 89), which illustrates successful sample-invariant
item calibration. The horizontal axis is based on the average of
the separate within group calibrations. The parallel lines indicate
that the difficulties of the items are relatively invariant across
raw-score groups. Unsuccessful sample-invariant item calibrations
are presented in Panel B for Subtest F (Rasch, 1980/1960, p. 98) and
are reflected in the non-parallel lines with different slopes.
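
The logic behind this graphical check can be sketched numerically. The example below is a simplification and not Rasch's own procedure: it uses model-implied success probabilities for two homogeneous ability groups in place of observed proportions within raw-score groups, and the difficulties, abilities and slopes are hypothetical values chosen for illustration. When every item has the same slope (the Rasch case), the centered within-group item logits coincide across groups, which corresponds to the parallel lines in Panel A; when the slopes differ, the within-group calibrations disagree, as in Panel B.

```python
import math


def centered_item_logits(ability, difficulties, slopes=None):
    """Within-group item logits, centered so that they sum to zero.

    slopes=None gives every item the same unit slope (the Rasch case);
    unequal slopes are a hypothetical departure from that model.
    """
    if slopes is None:
        slopes = [1.0] * len(difficulties)
    logits = []
    for slope, difficulty in zip(slopes, difficulties):
        p = 1.0 / (1.0 + math.exp(-slope * (ability - difficulty)))
        logits.append(math.log(p / (1.0 - p)))  # log-odds of success
    mean_logit = sum(logits) / len(logits)
    return [value - mean_logit for value in logits]


def max_gap(group_a, group_b):
    """Largest disagreement between two within-group calibrations."""
    return max(abs(a - b) for a, b in zip(group_a, group_b))


difficulties = [-1.0, 0.0, 0.5, 1.5]    # illustrative item difficulties
low_ability, high_ability = -1.0, 1.0   # two illustrative ability groups

# Equal slopes: the two calibrations agree up to rounding error.
gap_rasch = max_gap(
    centered_item_logits(low_ability, difficulties),
    centered_item_logits(high_ability, difficulties),
)

# Unequal slopes: the calibrations depend on which group was used.
unequal_slopes = [0.5, 1.0, 1.0, 2.0]   # hypothetical misfitting items
gap_misfit = max_gap(
    centered_item_logits(low_ability, difficulties, unequal_slopes),
    centered_item_logits(high_ability, difficulties, unequal_slopes),
)
```

With real data the within-group values would be estimated from observed responses and would agree only approximately, which is why Rasch inspected the plots rather than expecting exact equality.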

Due to the formal symmetry in Rasch's model between items and
individuals, he used a similar graphic approach to examine whether
or not item-invariant measurement of individuals had been achieved.
The results for Subtests N and F, also reproduced from Rasch
(1980/1960), are presented in Figure 2.

Insert Figure 2 about here

Panel A (Rasch, 1980/1960, p. 87) illustrates successful
item-invariant measurement with ability estimates relatively
invariant over item groups, while Panel B (Rasch, 1980/1960, p. 97)
provides evidence of unsuccessful item-invariant measurement as
evidenced by the inequality of the slopes based on the regression
of ability estimates obtained separately within each item group on
the total.
Even though there are more sophisticated methods for examining
invariance using statistical tests of item and person fit (Wright,
1988; Wright & Stone, 1979), the graphical methods clearly show
whether or not invariance has been achieved. As will be seen in
the next section, Thurstone used a similar graphical method to
examine whether or not his method of absolute scaling was
appropriate for a particular set of test data.

By focusing on the individual as the level of analysis, Rasch
was able to examine test data and identify when invariance was
exhibited. When the data fit the Rasch model, such as with Subtest
N, then the types of invariance which eluded research workers in
the classical test theory tradition can be obtained. To quote
Loevinger,

Rasch is concerned with a different and more rigorous kind of
generalization than Cronbach, Rajaratnam and Gleser. When his
model fits, the results are independent of the sample of
persons and of the particular items within some broad limits.
Within these limits, generality is, one might say, complete.

(Loevinger, 1965, p. 151)
Detailed descriptions of Rasch measurement are presented in Wright
and Stone (1979), Wright and Masters (1982) and Wright (1988).
Thurstone

Thurstone also recognized the importance of invariant
measurement. In fact, as pointed out by Bock and Jones
(1968), "in the system of psychological measurement based on the
Thurstonian models, we achieve some of the invariance in
measurement which is characteristic of the other sciences" (p. 9).
In developing his method of absolute scaling (1925, 1927, 1928a,
1928b) for calibrating test items, he was specifically motivated by
the lack of sample-invariance he observed in Thorndike's scaling
method. In his words,

the probable error, or PE [used in Thorndike's method], is not
valid as a unit of measurement for educational scales. Its
defect consists in that it does not possess the one
requirement of a unit of measurement, namely constancy
[emphasis added]. It fluctuates from one age to another.

(Thurstone, 1927, p. 505)
Thurstone's concept of constancy is his version of an invariance
condition and is an explicit consequence of measurement situations
that yield objective measurements. Thorndike's PE values fluctuate
because the item scale values are not sample-invariant, which
violates Thurstone's insistence that the "scale value of an item
should be the same no matter which age group is used in the
standardization" (Thurstone, 1928a, p. 119).
As did Rasch, Thurstone used the idea of a continuum to
represent the latent variable of interest and assumed that items
can be placed at points on this linear scale which would have a
fixed position regardless of the group being tested. In order to
illustrate this idea, Thurstone presented two figures which are
reproduced in Figure 3. The first figure presented in Panel A
(Thurstone, 1925, p. 437) shows the location of an item (open
circle on the base line) which has a fixed position regardless of
the distribution of abilities on this latent continuum for groups A
and B. According to Thurstone, "if any particular test item or
particular raw score is to be allocated on the absolute scale, its
scale value should be ideally the same whether determined by group
one or group two" (1925, p. 438). A second figure, shown in
Panel B of Figure 3 (Thurstone, 1927, p. 509), shows the location
of 7 items (a to g) and again presents the idea that the
calibration of these items should be invariant over groups A and
B.

In order to adjust for differences in the location and
variability of two or more distributions, Thurstone assumed a
normal distribution of ability for each group and essentially
adjusted statistically for differences in location and scale. In
order for these adjustments proposed by Thurstone to lead to
sample-invariant item calibration [illegible], Thurstone proposed a
graphical test of fit. An example is presented in Panel A of
Figure 4 (Thurstone, 1927, p. 513), which shows the plot of the
item scale values (sigma values) calibrated separately in grades 7
and 8.

Insert Figure 4 about here

According to Thurstone,

If the plot in Fig. 4 [Panel A] should be distinctly
non-linear, the present scaling method is not applicable.
[illegible] distributions cannot [illegible]. If the plot is
linear, it [illegible] both distributions [illegible]
the same scale or base line. (Thurstone, 1927, p. 513)

This "test of fit" can also be presented in the style of the
graphical displays used by Rasch; this [illegible] in Panel B of
Figure 4 ([illegible], p. 33) [illegible] the same data.
The effects of using [illegible]
of the ability distributions, as compared to [illegible]
shown in Figure 5. In Panel A of Figure 5 (Thurstone, 1927, p.
506), the results of using Thorndike's method to calibrate a
language scale developed by Trabue (1916) are presented; the average
language ability increases as a function of grade level, while the
variances remain constant. The results obtained by using
Thurstone's method are presented in Panel B of Figure 5 (Thurstone,
1927, p. 515); in this figure, average ability increases with grade
level, but the variances of the scores also increase. These
results seem theoretically plausible. Thurstone's method of
absolute scaling is described and illustrated in detail in
Engelhard (1984). An "experimental" adjustment for sample effects
which occurs with Thurstone's model for paired comparisons is
described in Andrich (1978).

Insert Figure 5 about here
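
The location and scale adjustment discussed above can be sketched in a few lines. This is an illustration under stated assumptions, not a reproduction of Thurstone's worked examples: ability is taken to be normally distributed in each group, the item sigma values are the normal deviates of the proportions failing each item, and the second group's mean and standard deviation are obtained by matching the mean and spread of the two sets of sigma values. The function names and the numbers are hypothetical, and `statistics.NormalDist` requires Python 3.8 or later.

```python
from statistics import NormalDist, mean, pstdev

STANDARD_NORMAL = NormalDist()


def sigma_values(proportions_correct):
    """Item locations in one group's own standard deviation units.

    A harder item (smaller proportion correct) gets a larger value.
    """
    return [STANDARD_NORMAL.inv_cdf(1.0 - p) for p in proportions_correct]


def location_and_scale(reference_sigmas, other_sigmas):
    """Mean and SD of the other group, in the reference group's units."""
    sd = pstdev(reference_sigmas) / pstdev(other_sigmas)
    location = mean(reference_sigmas) - sd * mean(other_sigmas)
    return location, sd


# Illustrative setup (not data from the sources cited in this paper).
true_scale_values = [-0.5, 0.0, 0.4, 1.0, 1.6]
group_one = NormalDist(0.0, 1.0)    # reference group defines the unit
group_two = NormalDist(0.5, 1.25)   # more able and more variable group

# Proportion of each group whose ability exceeds each item's location.
p_one = [1.0 - group_one.cdf(s) for s in true_scale_values]
p_two = [1.0 - group_two.cdf(s) for s in true_scale_values]

x_one = sigma_values(p_one)
x_two = sigma_values(p_two)
location_two, sd_two = location_and_scale(x_one, x_two)

# Item values from group two, re-expressed on the reference scale.
rescaled_two = [location_two + sd_two * x for x in x_two]
```

In this constructed case the plot of `x_two` against `x_one` is exactly linear, so the rescaled values from the second group reproduce the values obtained in the first. A distinctly non-linear plot would signal, in the terms of the passage quoted from Thurstone (1927), that the scaling method is not applicable to the data.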

Thurstone's method of absolute scaling can also be used to
scale test scores (Gulliksen, 1950), but a more interesting
discussion of issues related to item-invariant measurement is
presented by Thurstone (1926) in an article on the scoring of
individual performance. In this article, Thurstone presented a set
of conditions as follows:

1. It should not be required to have the same number of test
elements at each step of the scale.

2. It should be possible to omit several test questions at
different levels of the scale without affecting the
individual score.

3. It should be possible to include in the same scale two
forms of test.

4. It should not be required to submit every subject to the
whole range of the scale. The starting point and terminal
point, being selected by the examiner, should not directly
affect the individual score.

5. It should be possible to use the scale so that a rational
score may be determined for each individual subject and so
that the performance of groups of subjects may be compared.

6. The arithmetical labor in determining individual scores
should be a minimum.

7. The procedure should be as far as possible consistent with
psychophysical methods so that it will be free from the
logical errors involved in the Binet scales and its
variants.

Conditions one to five clearly show Thurstone's concern with
item-invariant measurement. In his 1926 paper, he goes on to propose a
scoring method which meets these conditions. It is beyond the scope
of this paper to present Thurstone's approach in detail; however,
it appears that he was essentially proposing what would be
recognized today as "person characteristic curves".

Many of Thurstone's articles on scaling are included in The
Measurement of Values (1959), although his work on absolute scaling
is not included in that volume. The technical details and
elaborations of Thurstonian models are presented in Bock and Jones
(1968), and Andrich (1988c) provides a useful overview of
Thurstone's contributions to measurement theory. Although it is
not directly relevant for this paper, it is interesting to note
that Thurstone (1947), as did Rasch (1953), also used the concept
of invariance as an important aspect of his approach to factor
analysis.
Thorndike

In 1904, Thorndike published the first edition of his highly
influential book entitled An Introduction to the Theory of Mental
and Social Measurements. Thorndike's major aim in writing this
book was to "introduce students to the theory of mental
measurements and to provide them with such knowledge and practice
as may assist them to follow critically quantitative evidence and
argument and to make their own researches exact and logical" (1904,
p. v). Thorndike's book was the standard reference on statistics
and quantitative methods in the mental and social sciences for the
first two decades of this century (Clifford, 1984; Engelhard, 1988;
Travers, 1983). Much of this influence can be attributed to
Thorndike's clear and expository writing style. He explicitly
acknowledged that contemporary work in measurement theory had not
been presented in a manner suitable for students without fairly
advanced mathematical skills, and he set out to present a less
mathematical introduction to measurement theory based on the belief
that "there is, happily, nothing in the general principles of
modern statistical theory but refined common sense, and little in
the techniques resulting from them that general intelligence can
not readily master" (p. 2).

Thorndike wrote extensively on educational and psychological
measurement, covering topics which ranged from the general
statement of his theory (Thorndike, 1904) to the measurement of a
variety of educational outcomes (Thorndike, 1910, 1914, 1921), as
well as intelligence (Thorndike, et al., 1926).

What were the basic measurement problems identified by
Thorndike? Thorndike clearly stated that the "special
difficulties" of measurement in the behavioral sciences are

1. Absence or imperfection of units in which to measure

2. Lack of constancy in the facts measured

3. Extreme complexity of the measurements to be made.

In order to illustrate the problems related to the absence of an
accepted unit of measurement, Thorndike (1904) pointed out that
the spelling tests developed by Joseph Mayer Rice did not have
equal units. Rice assumed that all of his spelling words were of
equal difficulty, while Thorndike argued that the correct spelling
of an easy versus a hard word did not reflect equal amounts of
spelling ability. Because the units of measurement are unequal,
Thorndike asserted that Rice's results were inaccurate. Without
general agreement on units, the meaning of our test scores becomes
more subjective. Within the framework of this paper, Thorndike was
illustrating that obtained scores may not be invariant over
subsets of items which vary in difficulty.

Inconstancy is the second major measurement problem identified
by Thorndike (1904). Many of the measurement problems encountered
in the behavioral sciences are related to random variation inherent
in human characteristics. Not only are these variations due to the
unreliability of our tests, but they also reflect within-subject
fluctuations. For example, if we measure a person's motivation, or
even body temperature, repeatedly, these values tend to vary.
Thorndike's concept of "constancy" comes closest to the idea of
invariance as developed in this paper.

The final measurement problem or "special difficulty"
identified by Thorndike pertains to the extreme complexity of the
variables and constructs that we wish to measure. This problem
reflects a concern with dimensionality. Most of the variables
worth measuring in the behavioral sciences do not readily translate
into unidimensional tests which permit the reporting of a single
score to represent the individual's location on the latent
variable or construct of interest. As pointed out by Jones and
Applebaum (1989), if unidimensionality is obtained for all items
and over all groups of examinees, then item parameters will be
invariant across groups and ability parameters will be invariant
across items. Methods for conducting item factor analyses designed
to explore this issue have been summarized by Mislevy (1986), and an
approach to this problem has been illustrated by Muraki and
Engelhard (1985).

Thorndike's method for obtaining sample-invariant item
calibration is very similar to Thurstone's method of absolute
scaling. As described by Thurstone,

Thorndike's scaling method consists in first determining the
scale value of each item for each grade separately with the
mean of each grade as an origin. The difficulty of a test
item for Grade V children for example, is determined by the
proportion of right answers to the test item in that grade.
When a test item has been scaled in several grades, the scale
values so obtained will of course be different because of the
fact that they are expressed as deviations from different
grade means as origins. Thorndike then reduces all these
measurements to a common origin in the construction of an
educational scale by adding to each scale value the scale
value of the mean of the grade (Thurstone, 1927, p. 508).
The major difference between Thorndike's method of item scaling and
Thurstone's method of absolute scaling is that Thorndike assumed
that the variances of the groups are equal. Thurstone criticized
this assumption:

... it is clear that in order to reduce the overlapping
sentences or test items to a common base line or scale it is
necessary to make not one but two adjustments. One of these
adjustments concerns the means of the several grade groups and
this adjustment is made by the Thorndike scaling methods. The
second adjustment which is not made by Thorndike concerns the
variation in dispersion of the several groups when they are
referred to a common scale (Thurstone, 1927, p. 509).
The results of using the two different methods were presented
earlier in Figure 5. In his later work, Thorndike did include an
adjustment for the range of scores (Thomson, 1940).
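
As a rough illustration of the difference between the two linking
methods, the following Python sketch scales five hypothetical items in
two hypothetical groups and places the second group's values on the
first group's scale, once with a mean adjustment only and once with
adjustments for both mean and dispersion. The item locations, the group
means and standard deviations, and all function names are assumptions
made for this example and are not taken from Thorndike or Thurstone;
the formulas are a simplified reading of the two adjustments described
in the quotation above.

```python
from statistics import NormalDist, mean, pstdev

STD_NORMAL = NormalDist()  # standard normal, used to turn proportions into deviates


def within_group_values(proportions_correct):
    """Scale values within one group: normal deviates, larger for harder items."""
    return [STD_NORMAL.inv_cdf(1.0 - p) for p in proportions_correct]


def mean_only_link(x_ref, x_other):
    """Shift the other group's values to the reference origin (equal spread assumed)."""
    shift = mean(x_ref) - mean(x_other)
    return [x + shift for x in x_other]


def mean_and_dispersion_link(x_ref, x_other):
    """Rescale and shift: adjusts for both the mean and the dispersion of the group."""
    scale = pstdev(x_ref) / pstdev(x_other)
    shift = mean(x_ref) - scale * mean(x_other)
    return [scale * x + shift for x in x_other]


# Hypothetical item locations on a common scale, and two hypothetical groups
# whose abilities are assumed to be normally distributed.
true_locations = [-1.0, -0.5, 0.0, 0.5, 1.0]
group_a = NormalDist(mu=0.0, sigma=1.0)   # reference group
group_b = NormalDist(mu=1.0, sigma=1.5)   # more able and more variable group

# Proportion of each group answering each item correctly.
p_a = [1.0 - group_a.cdf(s) for s in true_locations]
p_b = [1.0 - group_b.cdf(s) for s in true_locations]

x_a = within_group_values(p_a)
x_b = within_group_values(p_b)

linked_mean_only = mean_only_link(x_a, x_b)
linked_both = mean_and_dispersion_link(x_a, x_b)

for name, values in [("group A", x_a),
                     ("mean only", linked_mean_only),
                     ("mean and dispersion", linked_both)]:
    print(name, [round(v, 3) for v in values])
```

Under these assumed values, the mean-and-dispersion link reproduces the
reference group's scale values, while the mean-only link compresses
them because the second group is more variable than the first.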

An explicit statement of Thorndike's views of item-invariant
measurement of individuals was not found. Essentially, Thorndike
recommended that tests be constructed with items that are equally
spaced in terms of their scale values and that the number of items
right be used as a person's score.

COMPARISON AND DISCUSSION OF THREE MEASUREMENT THEORIES

A comparison of the major similarities and differences between
the measurement theories of Thorndike, Thurstone and Rasch is
summarized in Table 1. These three measurement theorists were all
working within a scaling tradition and based many of their
proposed methods for calibrating test items and measuring
individuals on statistical advances made within the field of
psychophysics. One of the differences between psychophysics and
psychometrics is that the independent variable is usually an
observable variable in psychophysics, while in psychometrics the
construct is usually unobservable. Since this construct is not
directly observable, these three psychometricians used the idea of
a latent continuum to represent this unobservable variable.

Insert Table 1 about here

Although they all held similar positions on these three
issues, there are also several important differences between
Thorndike and Thurstone as compared to Rasch. One of the major
differences is the recognition by Rasch that measurement models can
and should be developed based on the responses of individuals to
single test items. This focus on the individual, rather than on
groups, allowed Rasch to avoid making unnecessary assumptions
regarding the distribution of abilities which were used by both
Thorndike and Thurstone. As pointed out earlier, Thorndike's
method of scaling test items and Thurstone's method of absolute
scaling were both based on the assumption that abilities were
normally distributed. By using the individual, and not the group,
as the level of analysis, Rasch invented a measurement model which
was capable of providing estimates of the location of both items
and individuals on a latent variable continuum simultaneously.
This also allowed Rasch to develop a probabilistic model rather
than a deterministic model for the probability of each individual
succeeding on a particular test item as a function of his or her
ability; this probabilistic relationship is clearly shown in the
familiar S-shaped item characteristic curves. Further, by
simultaneously including item calibration and individual
measurement within one model, he was able to derive "conditional"
estimates of these parameters, which provides a framework for
determining whether or not invariance has been achieved.
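
The sense in which such a model supports both kinds of invariance can
be sketched numerically. The Python fragment below assumes the usual
logistic form of the Rasch model, in which the log-odds of success
equal ability minus difficulty; the ability and difficulty values and
the function names are hypothetical. It works with the model's
parameters directly, not with estimates from data, and shows that the
log-odds gap between two items is the same for every person, and that
the gap between two persons is the same for every item.

```python
import math


def rasch_probability(ability, difficulty):
    """Probability of success under the (assumed) logistic Rasch model."""
    return 1.0 / (1.0 + math.exp(-(ability - difficulty)))


def log_odds(p):
    """Log-odds (logit) of a probability strictly between 0 and 1."""
    return math.log(p / (1.0 - p))


# Hypothetical person abilities and item difficulties on the latent continuum.
abilities = [-2.0, 0.0, 1.5]
difficulties = [-1.0, 0.5]

# Comparison of the two items, computed separately for each person.
item_gaps = [
    log_odds(rasch_probability(t, difficulties[0]))
    - log_odds(rasch_probability(t, difficulties[1]))
    for t in abilities
]

# Comparison of the most and least able persons, computed separately per item.
person_gaps = [
    log_odds(rasch_probability(abilities[2], b))
    - log_odds(rasch_probability(abilities[0], b))
    for b in difficulties
]

print("item gaps by person:", [round(g, 6) for g in item_gaps])
print("person gaps by item:", [round(g, 6) for g in person_gaps])
```

With these assumed values every entry of the first list equals the
difference in item difficulties (1.5), and every entry of the second
equals the difference in abilities (3.5), whichever person or item is
used for the comparison.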

SUMMARY

Progress is as difficult to define within the field of
measurement as in any other field of study (Donovan, Laudan &
Laudan, 1988; Laudan, 1977). The analysis presented here suggests
that Rasch's work provides a theoretical and statistical framework
for the practical realization of invariant measurement which was
sought by both Thorndike and Thurstone. The simultaneous inclusion
of both ability and item difficulty within a probabilistic model
defined at the individual level of analysis provided a general
framework in which item and person parameters can be estimated
separately. Rasch was able to use recent advances in statistics,
such as the concept of sufficiency developed by Fisher (1925), to
propose an approach to measurement which provides practical
solutions to many testing problems related to invariance.

This paper is part of a larger program of research related to
the history and philosophy of measurement theory. The overall
purposes of this research are to identify basic measurement
problems and to describe how these measurement problems are
addressed by major measurement theorists. As pointed out earlier,
many of the measurement problems that we face today are not new, and
through the use of historical and comparative perspectives we can
gain a better understanding of both the measurement problems
themselves and the progress which has been made toward the
solution of these problems. Some of the perennial measurement
problems in the behavioral sciences can be viewed as part of the
quest for invariant measurement as described in this paper.
Another related concept which was not examined here is
unidimensionality. A historical and comparative analysis of this
concept and its development within scaling theory along the lines
used in this paper would be an important contribution to our
knowledge of progress in measurement theory.

This paper has focused on the concept of invariance as it has
appeared within the context of measurement theory. Invariance can
also be viewed more broadly as the quest for generality in science.
If we view science in its simplest form as a series of questions
and answers, then invariance addresses the problem of whether or
not the answers we find are comparable over groups and methods.
The concept of invariance within educational and psychological
research can also be expanded to include first, second and higher
order invariances. For example, invariances of the first order
might deal with mean differences between groups on a variable such
as math anxiety. A second order concern might be whether or not
the correlations between mathematics achievement and anxiety are
invariant over gender, social class and race groups. Higher order
invariances might relate to the generalizability of a system of
inter-relationships between more than two variables.

There are a number of areas for future research related to the
manner in which the concept of invariance appears within other
measurement theories that are not within the scaling tradition, but
derive from the classical test theory tradition. Some illustrative
questions are: How does the work on classical test theory fit into
the quest for invariance? Wasn't Spearman really looking for an
invariant ranking of individuals regardless of time of
administration and instrument used? Can the work of Cronbach and
others on generalizability theory be viewed as an attempt to
identify and examine sources of error variance in test scores which
are related to the concept of "invariance" in educational and
psychological tests as presented here? What about invariance
within the framework of two and three parameter item response
models? What about Guttman's research on psychometrics? What are
the explicit connections of classical measurement concepts, such as
reliability and validity, to the concept of invariance as presented
in this paper? How does invariance relate to unidimensionality?

In summary, the problem of invariance is of fundamental
importance for the development of meaningful measures in education
and psychology. Item-invariant estimates of individual abilities
and sample-invariant estimates of item difficulties are essential in
order to realize the advantages of objective measurement. The
conditions for objective measurement correspond to the concept of
invariance as developed in this paper; the conditions for
objective measurement are as follows:

First, the calibration of measuring instruments must be
independent of those objects that happen to be used for the
calibration. Second, the measurement of objects must be
independent of the instrument that happens to be used for the
measuring (Wright, 1968, p. 87).

This paper provides a historical and substantive review of the
problems related to item-invariant measurement, and it illustrates
the progress which has been made toward solving measurement
problems related to invariance. Further, this paper
contributes to an appreciation of Rasch's accomplishments and the
elegance of his approach to problems related to item-invariant
measurement. As pointed out by Andrich (1988b), Rasch's
achievement did not occur in a "historical vacuum" (p. 13) and this
paper illustrates how two major measurement theorists, Thorndike
and Thurstone, addressed issues which were eventually resolved by
Rasch.

Table 1

Comparison of Thorndike, Thurstone and Rasch on Major Issues

Issue                                        Thorndike  Thurstone  Rasch

Applied psychophysical methods to address    Yes        Yes        Yes
measurement problems (scaling tradition)

Utilized latent variables                    Yes        Yes        Yes

Recognized the importance of invariance      Yes        Yes        Yes

Measurement of individuals and calibration   No         No         Yes
of items addressed simultaneously

Developed models for individual responses    No         No         Yes
to test items

Figure 1

Rasch's approach to sample-invariant calibration

Figure 2

Rasch's approach to item-invariant measurement

Figure 3

Thurstone's approach to sample-invariant item calibration

A. Fixed location of one item (open circle) regardless of group (A & B)

B. Fixed location of seven items (a to g) regardless of group (A & B)

Figure 4

Examining sample-invariant item calibrations

B. Rasch's control of the model

Figure 5

Distribution of language ability in grades 2 to 12 (Trabue's 1916 data)